
Conversation

@mustafab0 (Contributor) commented on Feb 11, 2026

Problem

ManipulationModule had no skill layer — only low-level RPCs requiring a custom IPython client.
No way for an LLM agent to invoke pick/place/move skills through the framework.
RRT planner rejected valid joint states due to floating-point drift past URDF limits.


Solution

Added 11 @skill() methods to ManipulationModule (pick, place, move_to_pose, scan_objects, etc.) — see the sketch after this list.
Changed base class to SkillModule so skills auto-register with agents via autoconnect().
Created xarm-perception-agent blueprint composing xarm_perception + llm_agent + human_input.
Added detection snapshot so pick uses stable labels instead of volatile live cache.
Added limit_eps=1e-3 tolerance to RRT planner joint limit validation.
Removed ManipulationClient and run_* RPC wrappers — agent CLI replaces them.
CoordinatorClient updated to route execution through task_invoke instead of removed RPCs.
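
For orientation, a minimal sketch of the skill-layer pattern this PR introduces: a @skill()-decorated generator method on a SkillModule subclass that yields progress strings, matching the go_home/pick snippets quoted in the review below. The import path and exact signatures are assumptions, not code from this PR.

    # Hypothetical sketch — decorator and base-class names come from this PR,
    # but the import path and helper signatures are assumed for illustration.
    from collections.abc import Generator

    from dimos.skills import skill, SkillModule  # assumed import path

    class ManipulationModule(SkillModule):
        @skill()
        def open_gripper(self, robot_name: str | None = None) -> Generator[str, None, None]:
            """Short-horizon skill; autoconnect() registers it with agents by name."""
            yield "Opening gripper..."
            self._set_gripper_position(0.85, robot_name)  # helper seen in go_home below
            yield "Gripper open"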


Breaking Changes

ManipulationClient deleted — use dimos run xarm-perception-agent or direct RPC.
run_pick/run_place/run_place_back/run_go_init RPCs removed from ManipulationModule.


How to Test

  1. Run dimos run coordinator-mock then dimos run xarm-perception-agent
  2. Type scan the scene — verify objects listed with 3D positions
  3. Type pick up the <object> — verify approach, grasp, retract sequence
  4. Type place it back — verify placement at original pick position
  5. Verify skills appear in dimos agentspy

closes DIM-351
closes DIM-419

@greptile-apps (bot) commented on Feb 12, 2026

Greptile Overview

Greptile Summary

This PR adds an agent-invokable skill layer to manipulation by converting ManipulationModule to a SkillModule and introducing a set of @skill() actions (scan, pick/place, move_to_pose/joints, gripper control, go_home/init). It also adds a new xarm-perception-agent blueprint composing perception + an LLM agent + human CLI input, updates coordinator client calls to route through ControlCoordinator.task_invoke, removes the legacy ManipulationClient example, and loosens RRT joint-limit validation with a small epsilon to avoid floating-point drift rejecting valid states.

The changes fit into the existing architecture by leaning on SkillModule’s auto-registration of skills into agents via autoconnect(), while continuing to use the existing planning stack (WorldMonitor + PlannerSpec) and execution through the ControlCoordinator trajectory task API (now accessed via task_invoke).

Confidence Score: 2/5

  • This PR is not safe to merge until coordinator status polling and grasp RPC wiring issues are fixed.
  • Core new functionality (agentic pick/place) depends on coordinator task polling and grasp generation wiring. The PR currently calls a non-existent trajectory-task method (get_state) and drops TrajectoryStatus results expecting dicts, which will cause skills/CLI to report success early or never show progress. The grasp path also calls an RPC method that is neither declared nor implemented (GraspingModule.get_latest_grasps), causing runtime failures on the intended GraspGen-success path.
  • dimos/manipulation/manipulation_module.py, dimos/manipulation/control/coordinator_client.py

Important Files Changed

Filename — Overview
dimos/manipulation/control/coordinator_client.py — Switched trajectory status/execute/cancel to ControlCoordinator.task_invoke; current status handling expects a dict, but trajectory tasks return TrajectoryStatus, breaking wait_for_completion/status.
dimos/manipulation/manipulation_blueprints.py — Adds the xarm-perception-agent blueprint composed from xarm_perception + llm_agent + human_input and extends RobotModelConfig usage (home_joints). No correctness issues spotted in this file.
dimos/manipulation/manipulation_module.py — Rebased ManipulationModule onto SkillModule and added many @skill() methods; introduced broken coordinator status polling (uses get_state) and broken GraspingModule RPC integration (calls the non-existent get_latest_grasps).
dimos/manipulation/planning/examples/__init__.py — Removes the ManipulationClient export and replaces the docstring with a generic planning-examples blurb; no functional code remains.
dimos/manipulation/planning/examples/manipulation_client.py — Deletes the obsolete IPython ManipulationClient RPC helper (the breaking change described above). No remaining code to review.
dimos/manipulation/planning/planners/rrt_planner.py — Adds a small epsilon tolerance when validating start/goal joint limits, reducing false rejections due to floating-point drift (see the sketch after this table).
dimos/manipulation/planning/spec/config.py — Extends RobotModelConfig with home_joints and pre_grasp_offset fields used by the new skills; the schema change is straightforward.
dimos/robot/all_blueprints.py — Registers the new xarm-perception-agent blueprint entry.
pyproject.toml — Adds manipulation_module.py to the largefiles ignore list; no runtime behavior changes.
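
As a rough illustration of the rrt_planner.py change, a sketch of an epsilon-tolerant joint-limit check; limit_eps=1e-3 comes from the PR description, while the function and argument names here are assumptions.

    import numpy as np

    def within_joint_limits(q, lower, upper, limit_eps: float = 1e-3) -> bool:
        # Accept states that drift past the URDF limits by at most limit_eps
        # radians, so floating-point noise no longer rejects valid start/goal states.
        q, lower, upper = np.asarray(q), np.asarray(lower), np.asarray(upper)
        return bool(np.all(q >= lower - limit_eps) and np.all(q <= upper + limit_eps))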

Sequence Diagram

sequenceDiagram
  autonumber
  participant User as Human user
  participant HI as human_input (CLI)
  participant Agent as llm_agent
  participant Skills as SkillCoordinator
  participant Manip as ManipulationModule (SkillModule)
  participant WM as WorldMonitor
  participant Planner as RRTConnectPlanner
  participant CC as ControlCoordinator
  participant Task as Trajectory Task (JointTrajectoryController)

  User->>HI: type "pick up the cup"
  HI->>Agent: user message
  Agent->>Skills: select skill + args
  Skills->>Manip: pick(object_name)
  Manip->>WM: refresh_obstacles(min_duration)
  WM-->>Manip: obstacles added
  Manip->>Manip: _generate_grasps_for_pick()
  alt GraspingModule RPC wired
    Manip->>Manip: get_rpc_calls("GraspingModule.generate_grasps")
    Manip-->>Manip: grasp candidates (intended)
  else fallback heuristic
    Manip->>WM: list_cached_detections()
    WM-->>Manip: detections snapshot
    Manip-->>Manip: heuristic grasp pose
  end

  Manip->>Planner: plan_joint_path(start, goal)
  Planner-->>Manip: JointPath
  Manip->>Manip: generate JointTrajectory
  Manip->>CC: task_invoke(task, "execute", {trajectory})
  CC->>Task: execute(trajectory)
  Task-->>CC: accepted
  CC-->>Manip: result
  loop wait for completion
    Manip->>CC: task_invoke(task, "get_status" or "get_state")
    CC->>Task: get_status()/get_state()
    Task-->>CC: TrajectoryStatus / None
    CC-->>Manip: status
  end
  Manip-->>Skills: streamed progress strings
  Skills-->>Agent: tool results
  Agent-->>HI: assistant response

@greptile-apps (bot) left a comment

9 files reviewed, 3 comments

Comment on lines 1105 to 1121
try:
    generate = self.get_rpc_calls("GraspingModule.generate_grasps")
    result_str = generate(object_name, object_id, True)
    # If generate_grasps returned actual PoseArray via the grasps port,
    # we need to get the poses. For now, check if it returned an error string.
    if isinstance(result_str, str) and "No" in result_str:
        logger.info(f"GraspGen returned: {result_str}, falling back to heuristic")
    else:
        # GraspGen succeeded — get poses from the grasps port or RPC
        logger.info(f"GraspGen result: {result_str}")
        # Try to get the grasp poses via RPC
        try:
            get_grasps = self.get_rpc_calls("GraspingModule.get_latest_grasps")
            grasp_poses: PoseArray | None = get_grasps()
            if grasp_poses and len(grasp_poses.poses) > 0:
                return list(grasp_poses.poses)
        except Exception:

Invalid Grasping RPC usage

_generate_grasps_for_pick() calls self.get_rpc_calls("GraspingModule.get_latest_grasps"), but rpc_calls only declares GraspingModule.generate_grasps (manipulation_module.py:120-123) and GraspingModule doesn’t implement any get_latest_grasps RPC (it publishes to an Out[PoseArray] port instead; see dimos/manipulation/grasping/grasping.py:37-106). This will raise ValueError at runtime on the “GraspGen succeeded” path.
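
One possible direction implied by this comment (not code from the PR): consume grasps from GraspingModule's Out[PoseArray] port instead of an RPC. The port declaration mirrors the objects: In[list[DetObject]] port quoted later in this review; the latest() read API is an assumption.

    # Hypothetical fragment inside ManipulationModule; names are assumptions.
    grasps: In[PoseArray]  # wired to GraspingModule's Out[PoseArray] in the blueprint

    def _grasps_from_port(self) -> list[Pose]:
        grasp_poses: PoseArray | None = self.grasps.latest()  # assumed port-read API
        if grasp_poses is not None and grasp_poses.poses:
            return list(grasp_poses.poses)
        return []  # caller falls back to the heuristic grasp path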

@greptile-apps (bot) commented on Feb 12, 2026

Additional Comments (1)

dimos/manipulation/manipulation_module.py
Polling wrong task method

This switched to client.task_invoke(..., "get_state", {}), but the trajectory task implementation exposes get_status() (returning a TrajectoryStatus) and does not implement get_state (see dimos/manipulation/control/trajectory_controller/joint_trajectory_controller.py:162-273). As written, task_invoke will return None and this method returns None, which also breaks _wait_for_trajectory_completion() (it treats None as “task not found” and returns success early at manipulation_module.py:1001-1003).
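
A sketch of the corrected polling this comment implies: invoke get_status (which the trajectory task actually implements) and treat the result as a TrajectoryStatus object rather than a dict. The done attribute is an assumption about TrajectoryStatus.

    import time

    def wait_for_status(client, task_name: str, timeout: float = 60.0, poll: float = 0.2) -> bool:
        # Poll "get_status" (implemented by the task) instead of the
        # non-existent "get_state".
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            status = client.task_invoke(task_name, "get_status", {})
            if status is None:
                return False  # task not found — do not report success early
            if getattr(status, "done", False):  # assumed TrajectoryStatus field
                return True
            time.sleep(poll)
        return False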

@mustafab0 changed the title from "Feature: Basic Agentic Pick and Place - reimplimented manipulation skills" to "Feature: Basic Agentic Pick and Place - reimplemented manipulation skills" on Feb 12, 2026
@spomichter (Contributor) commented:

I assume I need hardware to test this.

Comment on lines +19 to +20
- Tier 2 (@skill): Individually testable actions (move_to_pose, open_gripper, etc.)
- Tier 3 (@skill): Composed behaviors (pick, place, pick_and_place)
Contributor:

Semantically, I'm not sure of the difference between how these are explained — both can be called by agents. Maybe better to clarify one as long-horizon and one as short-horizon "functions", given that there will rarely be a failure during an open-gripper action, but more potential for failure in a full pick and place.

Contributor Author:

Makes sense — short/long horizon is a better classification. The distinction here was just for readability and code organization.

objects: In[list[DetObject]]

# RPC calls for GraspGen integration (resolved at runtime if modules are deployed)
rpc_calls: list[str] = [
Contributor:

This should be deprecated with @jeff-hykin's RPC stuff, but I'll let it pass for now in the interest of the release timeline.

self._init_joints_captured = False

# Last pick position: stored during pick so place_back() can return the object
self._last_pick_position: tuple[float, float, float] | None = None
Contributor:

Any reason these aren't typed? It makes development harder without guaranteed types / typechecking.

Contributor Author:

The coordinator client interfaces with hardware much more loosely. I will add proper type checking to the ControlCoordinator API.

# Snapshotted detections from the last scan_objects/refresh call.
# The live detection cache is volatile (labels change every frame),
# so pick/place use this stable snapshot instead.
self._detection_snapshot: list[dict[str, object]] = []
Contributor:

Here also, I don't know what object means.

Contributor Author:

This should have been an Object — a confusion between object and Object. Will fix.

Comment on lines +947 to +955
def _make_pose(
    self, x: float, y: float, z: float, roll: float, pitch: float, yaw: float
) -> Pose:
    """Construct a Pose from position and Euler angles (radians)."""
    from dimos.msgs.geometry_msgs import Pose, Vector3
    from dimos.utils.transform_utils import euler_to_quaternion

    orientation = euler_to_quaternion(Vector3(roll, pitch, yaw))
    return Pose(position=Vector3(x, y, z), orientation=orientation)
Contributor:

Why are we doing this?

Comment on lines +957 to +985
def _wait_for_trajectory_completion(
    self, robot_name: RobotName | None = None, timeout: float = 60.0, poll_interval: float = 0.2
) -> bool:
    """Wait for trajectory execution to complete.

    Polls the coordinator task state via task_invoke. Falls back to waiting
    for the trajectory duration if the coordinator is unavailable.

    Args:
        robot_name: Robot to monitor
        timeout: Maximum wait time in seconds
        poll_interval: Time between status checks

    Returns:
        True if trajectory completed successfully
    """
    robot = self._get_robot(robot_name)
    if robot is None:
        return True
    rname, _, config, _ = robot
    client = self._get_coordinator_client()

    if client is None or not config.coordinator_task_name:
        # No coordinator — wait for trajectory duration as fallback
        traj = self._planned_trajectories.get(rname)
        if traj is not None:
            logger.info(f"No coordinator status — waiting {traj.duration:.1f}s for trajectory")
            time.sleep(traj.duration + 0.5)
        return True
Contributor:

This should be handled by @paul-nechifor's skill-coordinator stuff. We should have a universal and generic set of "Skills" or "Processes" within dimOS, all governed by the same coordinator and manager. Fine for now since that's not done yet.

  1. dimOS skills are functions that agents can call
  2. dimOS processes are short-, mid-, or long-running tasks, typically triggered by skill calls (or directly)

Comment on lines +1058 to +1087
def _find_object_in_detections(
    self, object_name: str, object_id: str | None = None
) -> dict[str, object] | None:
    """Find an object in the detection snapshot by name or ID.

    Uses the snapshot taken during the last scan_objects/refresh call,
    not the volatile live cache (which changes labels every frame).

    Args:
        object_name: Name/label to search for
        object_id: Optional specific object ID

    Returns:
        Detection dict with position info, or None
    """
    detections = self._detection_snapshot
    if not detections:
        logger.warning("No detection snapshot — call scan_objects() first")
        return None

    for det in detections:
        if object_id and str(det.get("object_id", "")) == str(object_id):
            return dict(det)
        name = str(det.get("name", "")).lower()
        if object_name.lower() in name or name in object_name.lower():
            return dict(det)

    available = [str(d.get("name", "?")) for d in detections]
    logger.warning(f"Object '{object_name}' not found in snapshot. Available: {available}")
    return None
Contributor:

This is pretty horrible, and it's caused by the lack of types I flagged above on this detections dict-list thing. We have a detection type that would let you do this in one line. There's no need for a dict, because 2DDetection has a string field, so for any detection you can just check detection.string:

    detections_list: list[2DDetection]
    for det in detections_list:
        ...

And even better, we could just define a type 2DDetectionList or 2DDetectionDB which has search and __iter__ methods.

The only reason this may be acceptable in the PR is that the above is undergoing a refactor from @leshy, so I don't know the current state.
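
For concreteness, a sketch of the container the reviewer proposes; the class is spelled Detection2D here only because Python identifiers cannot start with a digit, and all field names are assumptions.

    from dataclasses import dataclass, field
    from typing import Iterator

    @dataclass
    class Detection2D:
        name: str  # assumed label field
        object_id: str
        position: tuple[float, float, float]

    @dataclass
    class Detection2DList:
        items: list[Detection2D] = field(default_factory=list)

        def __iter__(self) -> Iterator[Detection2D]:
            return iter(self.items)

        def search(self, name: str) -> Detection2D | None:
            # The one-line lookup the reviewer alludes to
            needle = name.lower()
            return next((d for d in self.items if needle in d.name.lower()), None)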

Comment on lines +1307 to +1331
def go_home(self, robot_name: str | None = None) -> Generator[str, None, None]:
    """Move the robot to its home/observe joint configuration.

    Opens the gripper and moves to the predefined home position.

    Args:
        robot_name: Robot to move (only needed for multi-arm setups).
    """
    robot = self._get_robot(robot_name)
    if robot is None:
        yield "Error: Robot not found"
        return
    _, _, config, _ = robot

    if config.home_joints is None:
        yield "Error: No home_joints configured for this robot"
        return

    yield "Opening gripper..."
    self._set_gripper_position(0.85, robot_name)
    time.sleep(0.5)

    yield "Planning motion to home position..."
    if not self.plan_to_joints(config.home_joints, robot_name):
        yield "Error: Failed to plan path to home position"
Contributor:

Yeah, I assume @paul-nechifor's stuff will solve this generator/yielding requirement, so fine for now.

Comment on lines +1366 to +1379
def pick(
    self,
    object_name: str,
    object_id: str | None = None,
    robot_name: str | None = None,
) -> Generator[str, None, None]:
    """Pick up an object by name using grasp planning and motion execution.

    Scans the scene, generates grasp poses (via GraspGen or heuristic fallback),
    plans collision-free approach/grasp/retract motions, and executes them.

    Args:
        object_name: Name of the object to pick (e.g. "cup", "bottle", "can").
        object_id: Optional unique object ID from perception for precise identification.
Contributor:

@paul-nechifor, flagging for you since skills like this are helpful to see what we're solving for on agents.

@mustafab0 (Contributor Author) commented:

"I assume I need hardware to test this."

You only need a RealSense camera to test this PR. All the robot hardware can be simulated using the coordinator-mock blueprint.



Development

Successfully merging this pull request may close these issues:

  • Integrate agent with Manipulation module
  • Implement Manipulation Skills
  • Reimplement manipulation skills
